SIMD Parallel Sparse Matrix-Vector and Transposed-Matrix-Vector Multiplication in DD Precision
Abstract
We accelerate the product of a double-precision sparse matrix and a double-double (DD) vector (DD-SpMV), and the product of the transposed matrix and a DD vector (DD-TSpMV), using AVX2 SIMD instructions for use in Krylov subspace methods. We compare several storage formats for DD-SpMV and DD-TSpMV under AVX2 in order to eliminate the factors that degrade performance in CRS. Our experience indicates that BCRS4x1, whose block size fits the length of the SIMD register, is effective.
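To make the storage-format comparison concrete, the following is a minimal sketch of BCRS with 4x1 blocks, assuming the usual definition: rows are grouped four at a time, and each stored block holds the four entries of one column within that group, zero-padded where needed. The four values of a block then line up with the four 64-bit lanes of a 256-bit AVX2 register. The sketch is scalar Python in plain double precision and only illustrates the layout and loop structure; it is not the authors' implementation, and the names `crs_to_bcrs4x1`, `bcrs4x1_spmv`, and `bcrs4x1_tspmv` are hypothetical.

```python
B = 4  # block height: four doubles fill one 256-bit AVX2 register


def crs_to_bcrs4x1(n_rows, row_ptr, col_idx, values):
    """Convert CRS arrays to BCRS with 4x1 blocks (4 rows, 1 column)."""
    brow_ptr, bcol_idx, bvalues = [0], [], []
    n_brows = (n_rows + B - 1) // B
    for br in range(n_brows):
        blocks = {}  # column index -> 4 zero-padded values
        for r in range(br * B, min((br + 1) * B, n_rows)):
            for k in range(row_ptr[r], row_ptr[r + 1]):
                blk = blocks.setdefault(col_idx[k], [0.0] * B)
                blk[r - br * B] += values[k]
        for c in sorted(blocks):
            bcol_idx.append(c)
            bvalues.extend(blocks[c])  # stored flat, 4 values per block
        brow_ptr.append(len(bcol_idx))
    return brow_ptr, bcol_idx, bvalues


def bcrs4x1_spmv(n_rows, brow_ptr, bcol_idx, bvalues, x):
    """y = A x. Each block multiplies 4 matrix values by one x element."""
    y = [0.0] * n_rows
    for br in range(len(brow_ptr) - 1):
        acc = [0.0] * B  # stands in for one 4-lane SIMD accumulator
        for k in range(brow_ptr[br], brow_ptr[br + 1]):
            xc = x[bcol_idx[k]]  # one x element, broadcast to all lanes
            for lane in range(B):
                acc[lane] += bvalues[B * k + lane] * xc
        for lane in range(B):
            r = br * B + lane
            if r < n_rows:  # skip padding rows of the last block row
                y[r] = acc[lane]
    return y


def bcrs4x1_tspmv(n_rows, n_cols, brow_ptr, bcol_idx, bvalues, x):
    """y = A^T x. Each block contributes a 4-element dot product to y[col]."""
    y = [0.0] * n_cols
    for br in range(len(brow_ptr) - 1):
        # Four consecutive x entries, zero-padded past the last row.
        xs = [x[r] if r < n_rows else 0.0 for r in range(br * B, br * B + B)]
        for k in range(brow_ptr[br], brow_ptr[br + 1]):
            dot = 0.0
            for lane in range(B):
                dot += bvalues[B * k + lane] * xs[lane]
            y[bcol_idx[k]] += dot
    return y
```

In this layout the untransposed product needs no horizontal reduction: the four lanes accumulate four different rows. The transposed product instead reduces the four lanes into a single output element per block. In a DD variant, each entry of `x` and each accumulator would be a (high, low) pair of doubles rather than a single double.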
Similar resources
AVX Acceleration of DD Arithmetic Between a Sparse Matrix and Vector
High precision arithmetic can improve the convergence of Krylov subspace methods; however, it is very costly. One system of high precision arithmetic is double-double (DD) arithmetic, which uses more than 20 double precision operations for one DD operation. We accelerated DD arithmetic using AVX SIMD instructions. The performances of vector operations in 4 threads are 51-59% of peak performance...
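To illustrate why a single DD operation costs many double-precision operations, here is a minimal scalar sketch built on the standard error-free transformations (Knuth's two-sum and a Dekker-style split-based two-product). A DD number is represented as a (high, low) pair of doubles. This is not the AVX code the abstract describes, the function names are hypothetical, and the addition shown is the simpler, less strict variant.

```python
def two_sum(a, b):
    """Return (s, err) with s = fl(a + b) and a + b = s + err exactly."""
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err


def fast_two_sum(a, b):
    """Same as two_sum, but valid only when |a| >= |b|."""
    s = a + b
    err = b - (s - a)
    return s, err


def split(a):
    """Split a double into high and low halves whose products are exact."""
    t = 134217729.0 * a  # 2**27 + 1
    hi = t - (t - a)
    lo = a - hi
    return hi, lo


def two_prod(a, b):
    """Return (p, err) with p = fl(a * b) and a * b = p + err exactly."""
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    err = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, err


def dd_add(x, y):
    """Add two DD numbers given as (hi, lo) pairs."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return fast_two_sum(s, e)


def dd_mul_d(x, a):
    """Multiply a DD number x = (hi, lo) by a plain double a."""
    p, e = two_prod(x[0], a)
    e += x[1] * a
    return fast_two_sum(p, e)
```

For example, `dd_add((1.0, 0.0), (1e-20, 0.0))` keeps the small term in the low word, whereas `1.0 + 1e-20` in plain double precision rounds it away. The `dd_mul_d` form (DD times double) is the combination that arises when a double-precision matrix entry multiplies a DD vector entry.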
Breaking the performance bottleneck of sparse matrix-vector multiplication on SIMD processors
The low utilization of SIMD units and memory bandwidth is the main performance bottleneck on SIMD processors for sparse matrix-vector multiplication (SpMV), which is one of the most important kernels in many scientific and engineering applications. This paper proposes a hybrid optimization method to break the performance bottleneck of SpMV on SIMD processors. The method includes a new sparse ma...
A SIMD Sparse Matrix-Vector Multiplication Algorithm for Computational Electromagnetics and Scattering Matrix Models
Kapadia, Nirav Harish. M.S.E.E., Purdue University. May 1994. A SIMD Sparse Matrix-Vector Multiplication Algorithm for Computational Electromagnetics and Scattering Matrix Models. Major Professor: Jose Fortes. A large number of problems in numerical analysis require the multiplication of a sparse matrix by a vector. In spite of the large amount of fine-grained parallelism available in the proc...
Efficient multithreaded untransposed, transposed or symmetric sparse matrix-vector multiplication with the Recursive Sparse Blocks format
In earlier work we have introduced the “Recursive Sparse Blocks” (RSB) sparse matrix storage scheme oriented towards cache-efficient matrix-vector multiplication (SpMV) and triangular solution (SpSV) on cache-based shared memory parallel computers. Both the transposed (SpMV_T) and symmetric (SymSpMV) matrix-vector multiply variants are supported. RSB stands for a meta-format: it recursively...
Direct and Transposed Sparse Matrix-Vector Multiplication
In this paper we investigate the execution of Ab and A^T b, where A is a sparse matrix and b a dense vector, using the Blocked Based Compression Storage (BBCS) scheme and an Augmented Vector Architecture (AVA). In particular, we demonstrate that by using the BBCS format, we can represent both the direct and the transposed matrix for the purposes of matrix-vector multiplication with no additional...
Publication date: 2016